This notebook relates to the NOAA Weather Dataset - JFK Airport (New York). The dataset contains 114,546 hourly observations of 12 local climatological variables (such as temperature and wind speed) collected at JFK airport. This dataset can be obtained for free from the IBM Developer Data Asset Exchange.
In this notebook, we clean the raw dataset by restricting it to a subset of numerical, hourly observation columns; handling missing, trace, and erroneous values; resampling to exact hourly measurement intervals; encoding the wind direction and pressure tendency features; and renaming the columns.
Before you run this notebook, complete the following steps:
When you import this project from the Watson Studio Gallery, a token should be automatically generated and inserted at the top of this notebook as a code cell such as the one below:
# @hidden_cell
# The project token is an authorization token that is used to access project resources like data sources, connections, and used by platform APIs.
from project_lib import Project
project = Project(project_id='YOUR_PROJECT_ID', project_access_token='YOUR_PROJECT_TOKEN')
pc = project.project_context
If you do not see the cell above, follow these steps to enable the notebook to access the dataset from the project's resources:
1. Click **More -> Insert project token** in the top-right menu section. This should insert a cell at the top of this notebook similar to the example given above.
2. If an error is displayed indicating that no project token is defined, follow these instructions.
3. Run the newly inserted cell before proceeding with the notebook execution below.
Import and configure the required modules.
# Define required imports
import pandas as pd
import numpy as np
import sys
import re
# These set pandas max column and row display in the notebook
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', 50)
We start by reading in the raw dataset, displaying the first few rows of the dataframe, and taking a look at the columns and column types present.
# define filename
DATA_PATH = 'jfk_weather.csv'
def get_file_handle(fname):
# Project data path for the raw data file
data_path = project.get_file(fname)
data_path.seek(0)
return data_path
# Using pandas to read the data
# Since the `DATE` column contains date-time information, we use the pandas parse_dates keyword for easier data processing
data_path = get_file_handle(DATA_PATH)
raw_data = pd.read_csv(data_path, parse_dates=['DATE'])
raw_data.head()
raw_data.dtypes
As you can see above, there are a lot of fields which are non-numerical. Usually these will be fields that contain text or categorical data, e.g. `HOURLYSKYCONDITIONS`. There are also fields, such as the main temperature field of interest `HOURLYDRYBULBTEMPF`, that we expect to be numerical but are instead of `object` type. This often indicates that there may be missing (or `null`) values, or some other unusual readings that we may have to deal with (since otherwise the field would have been fully parsed as a numerical data type).
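As an aside (not part of the cleaning pipeline), one quick way to surface the entries that force a column to `object` dtype is `pd.to_numeric` with `errors='coerce'`; the toy values below are illustrative:

```python
import pandas as pd

# A toy column mixing numeric strings with special codes, as in the raw data
s = pd.Series(['33', '34', 'T', '0.01s'], dtype=object)

# Entries that fail numeric parsing become NaN, which reveals the offenders
coerced = pd.to_numeric(s, errors='coerce')
offenders = s[coerced.isna()]
print(offenders.tolist())  # ['T', '0.01s']
```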
In addition, some fields relate to hourly observations, while others relate to daily or monthly intervals. For purposes of later exploratory data analysis, we will restrict the dataset to a certain subset of numerical fields that relate to hourly observations.
In this section, we refer to the NOAA Local Climatological Data Documentation to describe the fields and meaning of various values.
First, we select only the subset of data columns of interest and inspect the column types.
# Choose what columns to import from raw data
column_subset = [
'DATE',
'HOURLYVISIBILITY',
'HOURLYDRYBULBTEMPF',
'HOURLYWETBULBTEMPF',
'HOURLYDewPointTempF',
'HOURLYRelativeHumidity',
'HOURLYWindSpeed',
'HOURLYWindDirection',
'HOURLYStationPressure',
'HOURLYPressureTendency',
'HOURLYSeaLevelPressure',
'HOURLYPrecip',
'HOURLYAltimeterSetting'
]
# Filter dataset to relevant columns
hourly_data = raw_data[column_subset]
# Set date index
hourly_data = hourly_data.set_index(pd.DatetimeIndex(hourly_data['DATE']))
hourly_data.drop(['DATE'], axis=1, inplace=True)
hourly_data.replace(to_replace='*', value=np.nan, inplace=True)
hourly_data.head()
hourly_data.dtypes
From the dataframe preview above, we can see that the column `HOURLYPrecip`, which is the hourly measure of precipitation levels, contains both `NaN` and `T` values. `T` specifies trace amounts of precipitation, while `NaN` means "not a number" and is used to denote missing values.
We can also inspect the unique values present for the field.
hourly_data['HOURLYPrecip'].unique()
We can see that some values end with an `s` (indicating snow), while there is a strange value `0.020.01s`, which appears to be an error of some sort. To deal with `T` values, we will set the observation to `0`. We will also replace the erroneous value `0.020.01s` with `NaN`.
# Fix imported data
hourly_data['HOURLYPrecip'].replace(to_replace='T', value='0.00', inplace=True)
hourly_data['HOURLYPrecip'].replace('0.020.01s', np.nan, inplace=True)
Next, we will convert string columns that refer to numerical values to numerical types. For columns such as `HOURLYPrecip`, we first also drop the non-numerical parts of the value (the `s` character).
# Set of columns to convert
messy_columns = column_subset[1:]
# Convert columns to float32 datatype
for i in messy_columns:
hourly_data[i] = hourly_data[i].apply(lambda x: re.sub('[^0-9,.-]', '', x) if type(x) == str else x).replace('', np.nan).astype(('float32'))
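To illustrate what the regular expression in the loop above does, here is a small check on made-up sample values (note that in the pipeline, `T` has already been replaced by `0.00` at this point):

```python
import re

import numpy as np
import pandas as pd

samples = pd.Series(['0.03s', '10.00', '-2.5', 'T'])

# Strip any character that is not a digit, comma, dot, or minus sign,
# then turn empty strings into NaN and cast to float32
cleaned = (samples
           .apply(lambda x: re.sub('[^0-9,.-]', '', x) if type(x) == str else x)
           .replace('', np.nan)
           .astype('float32'))
print(cleaned.tolist())
```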
We can now see that all fields have numerical data type.
print(hourly_data.info())
print()
hourly_data.head()
Next, we will clean up some of the data columns to ensure their values fall within the parameters defined by the NOAA documentation (referred to above).
# Generate the summary statistics for each column
hourly_data.describe()
According to the documentation, the `HOURLYPressureTendency` field should be an integer value in the range `[0, 8]`. Let's check if this condition holds for this dataset.
# Check if categorical variable HOURLYPressureTendency ever has a non-integer entry outside the bounds of 0-8
cond = len(hourly_data[~hourly_data['HOURLYPressureTendency'].isin(list(range(0,9)) + [np.nan])])
print('Hourly Pressure Tendency should be between 0 and 8: {}'.format(cond == 0))
`HOURLYVISIBILITY` should be a numeric value in the range `[0, 10]`. Let's check this condition too.
# Hourly Visibility should be between 0 and 10
hourly_data[(hourly_data['HOURLYVISIBILITY'] < 0) | (hourly_data['HOURLYVISIBILITY'] > 10)]
We find that a couple of observations fall outside the range. These must be spurious data observations, and we handle them by replacing them with `NaN`.
# Replace any hourly visibility figure outside these bounds with nan
hourly_data.loc[hourly_data['HOURLYVISIBILITY'] > 10, 'HOURLYVISIBILITY'] = np.nan
# Hourly Visibility should be between 0 and 10
cond = len(hourly_data[(hourly_data['HOURLYVISIBILITY'] < 0) | (hourly_data['HOURLYVISIBILITY'] > 10)])
print('Hourly Visibility should be between 0 and 10: {}'.format(cond == 0))
Finally, we check if there are any duplicates with respect to our `DATE` index, and furthermore that our dates are in the correct order (that is, strictly increasing).
cond = len(hourly_data[hourly_data.index.duplicated()].sort_index())
print('Date index contains no duplicate entries: {}'.format(cond == 0))
# Make sure time index is sorted and increasing
print('Date index is strictly increasing: {}'.format(hourly_data.index.is_monotonic_increasing))
Most time-series analysis requires (or certainly works much better with) data that has fixed measurement intervals. As you may have noticed from the various data samples above, the measurement intervals for this dataset are not exactly hourly. So, we will use pandas' resampling functionality to create a dataset that has exact hourly measurement intervals.
# Resample (downsample) to hourly rows (we're shifting everything up by 9 minutes!)
hourly_data = hourly_data.resample('60min').last().shift(periods=1) #Note: use resample('60min', base=51) to resample on the 51st of every hour
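To make the effect of this resampling concrete, here is a small sketch with made-up timestamps at 51 minutes past the hour; the `.last().shift(periods=1)` combination assigns each bin's reading to the following hour mark:

```python
import pandas as pd

# Toy observations taken at 51 minutes past each hour
idx = pd.to_datetime(['2017-01-01 00:51', '2017-01-01 01:51', '2017-01-01 02:51'])
obs = pd.Series([10.0, 11.0, 12.0], index=idx)

# Bin into hourly intervals, keep the last reading in each bin,
# then shift so the 00:51 reading is reported at the 01:00 mark
hourly = obs.resample('60min').last().shift(periods=1)
print(hourly)
```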
We will now also replace missing values. For numerical values, we will linearly interpolate between the previous and next valid observations. For the categorical `HOURLYPressureTendency` field, we will replace missing values with the last valid observation.
hourly_data['HOURLYPressureTendency'] = hourly_data['HOURLYPressureTendency'].fillna(method='ffill') # fill with last valid observation
hourly_data = hourly_data.interpolate(method='linear') # interpolate missing values
hourly_data.drop(hourly_data.index[0], inplace=True) # drop first row
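A toy comparison of the two fill strategies used above, on illustrative values (`.ffill()` is equivalent to the `fillna(method='ffill')` call in the cell above):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

interpolated = s.interpolate(method='linear')  # fills the gap linearly
forward_filled = s.ffill()                     # repeats the last valid value

print(interpolated.tolist())    # [1.0, 2.0, 3.0, 4.0]
print(forward_filled.tolist())  # [1.0, 1.0, 1.0, 4.0]
```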
print(hourly_data.info())
print()
hourly_data.head()
The final pre-processing step we will perform is to handle two of our columns in a special way in order to correctly encode these features. They are:

- `HOURLYWindDirection` - wind direction
- `HOURLYPressureTendency` - an indicator of pressure changes

For `HOURLYWindDirection`, we encode the raw feature value as two new values, which capture the cyclical nature of wind direction - that is, we are encoding the compass-point nature of wind direction measurements.
# Transform HOURLYWindDirection into a cyclical variable using sin and cos transforms
hourly_data['HOURLYWindDirectionSin'] = np.sin(hourly_data['HOURLYWindDirection']*(2.*np.pi/360))
hourly_data['HOURLYWindDirectionCos'] = np.cos(hourly_data['HOURLYWindDirection']*(2.*np.pi/360))
hourly_data.drop(['HOURLYWindDirection'], axis=1, inplace=True)
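To see why this transform helps, consider that wind directions of 10° and 350° are numerically far apart but close on the compass; in the (sin, cos) plane they become near neighbours. A quick illustrative check (the `encode` helper is hypothetical, mirroring the transform above):

```python
import numpy as np

def encode(deg):
    """Map a compass direction in degrees to a point on the unit circle."""
    rad = deg * (2. * np.pi / 360)
    return np.array([np.sin(rad), np.cos(rad)])

# Distance between nearby compass points (10° vs 350°) is small...
near = np.linalg.norm(encode(10) - encode(350))
# ...while genuinely different directions (10° vs 100°) stay far apart
far = np.linalg.norm(encode(10) - encode(100))
print(near, far)
```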
For `HOURLYPressureTendency`, the feature value is in fact a categorical feature with three levels:

- `0-3` indicates an increase in pressure over the previous 3 hours
- `4` indicates no change during the previous 3 hours
- `5-8` indicates a decrease over the previous 3 hours

Hence, we encode this feature into 3 dummy values representing these 3 potential states.
# Transform HOURLYPressureTendency into 3 dummy variables based on NOAA documentation
hourly_data['HOURLYPressureTendencyIncr'] = [1.0 if x in [0,1,2,3] else 0.0 for x in hourly_data['HOURLYPressureTendency']] # 0 through 3 indicates an increase in pressure over previous 3 hours
hourly_data['HOURLYPressureTendencyDecr'] = [1.0 if x in [5,6,7,8] else 0.0 for x in hourly_data['HOURLYPressureTendency']] # 5 through 8 indicates a decrease over previous 3 hours
hourly_data['HOURLYPressureTendencyConst'] = [1.0 if x == 4 else 0.0 for x in hourly_data['HOURLYPressureTendency']] # 4 indicates no change during previous 3 hours
hourly_data.drop(['HOURLYPressureTendency'], axis=1, inplace=True)
hourly_data['HOURLYPressureTendencyIncr'] = hourly_data['HOURLYPressureTendencyIncr'].astype(('float32'))
hourly_data['HOURLYPressureTendencyDecr'] = hourly_data['HOURLYPressureTendencyDecr'].astype(('float32'))
hourly_data['HOURLYPressureTendencyConst'] = hourly_data['HOURLYPressureTendencyConst'].astype(('float32'))
hourly_data.columns
# define the new column names
columns_new_name = [
'visibility',
'dry_bulb_temp_f',
'wet_bulb_temp_f',
'dew_point_temp_f',
'relative_humidity',
'wind_speed',
'station_pressure',
'sea_level_pressure',
'precip',
'altimeter_setting',
'wind_direction_sin',
'wind_direction_cos',
'pressure_tendency_incr',
'pressure_tendency_decr',
'pressure_tendency_const'
]
columns_name_map = {c:columns_new_name[i] for i, c in enumerate(hourly_data.columns)}
hourly_data_renamed = hourly_data.rename(columns=columns_name_map)
print(hourly_data_renamed.info())
print()
hourly_data_renamed.head()
# Explore some general information about the dataset
print('# of megabytes held by dataframe: ' + str(round(sys.getsizeof(hourly_data_renamed) / 1000000,2)))
print('# of features: ' + str(hourly_data_renamed.shape[1]))
print('# of observations: ' + str(hourly_data_renamed.shape[0]))
print('Start date: ' + str(hourly_data_renamed.index[0]))
print('End date: ' + str(hourly_data_renamed.index[-1]))
print('# of days: ' + str((hourly_data_renamed.index[-1] - hourly_data_renamed.index[0]).days))
print('# of months: ' + str(round((hourly_data_renamed.index[-1] - hourly_data_renamed.index[0]).days/30,2)))
print('# of years: ' + str(round((hourly_data_renamed.index[-1] - hourly_data_renamed.index[0]).days/365,2)))
Finally, we save the cleaned dataset as a Project asset for later re-use. You should see an output like the one below if successful:
{'file_name': 'jfk_weather_cleaned.csv',
'message': 'File saved to project storage.',
'bucket_name': 'jfkweatherdata-donotdelete-pr-...',
'asset_id': '...'}
Note: In order for this step to work, your project token (see the first cell of this notebook) must have the `Editor` role. By default this will overwrite any existing file.
project.save_data("jfk_weather_cleaned.csv", hourly_data_renamed.to_csv(float_format='%g'), overwrite=True)
Next, run the Part 2 - Data Analysis notebook to explore the cleaned dataset.

This notebook was created by the Center for Open-Source Data & AI Technologies.
Copyright © 2019 IBM. This notebook and its source code are released under the terms of the MIT License.